A Write-friendly Store for SILT: Part II

Indexing for reduced per-key memory consumption#

Since this store contains the least amount of keys, the overall impact of using the hash table on per-key memory consumption will be low. However, since reduced per-key memory consumption is our goal, we should find ways to reduce the impact of the above hash table. We can reduce per-key memory used by the write-friendly store by:

  1. Ensuring high occupancy of the hash table

  2. Using a partial key instead of the full key

Partial-key cuckoo hashing for higher occupancy#

The technique we will use for hashing is partial-key cuckoo hashing. The partial-key means we will use a part of the key.

Hash functions#

Cuckoo hashing requires two hash functions, h1, and h2. Both map a key K to two candidate buckets, h1(K) and h2(K), on the same table.

Insertion algorithm#

To insert a new key, K1, we will compute its candidate buckets h1(K1) and h2(K1) and check if one, both, or none of the two buckets are empty.

  1. If both buckets are empty, store K1 in one of the buckets.

  2. If one bucket is empty, check if K1 is stored in the occupied bucket.

    1. If K1 is already stored, update the offset.

    2. Otherwise, store K1 in the empty bucket.

  3. If neither bucket is empty, check if K1 is stored in one of the occupied buckets.

    1. If K1 is already stored, update the offset.

    2. Otherwise, K1 displaces K2 in one of the K1 candidate buckets. We will insert K2 into its other candidate bucket.

      1. If the other candidate bucket for K2 is empty, it is stored there.

      2. If the other candidate bucket is not empty, K2 displaces K3, which resides in its other candidate bucket.

      3. We will repeat the process in 3(II) for K3 until we find an empty bucket for a limited number of displacements. If we reach our allowed number of displacements and cannot find an empty bucket, we will initialize a new write-friendly store and store the displaced key in one of its candidate buckets in the new store.

Created with Fabric.js 3.6.6
Compute the candidate buckets for K2 using hash functions h1() and h2()

1 of 8

Created with Fabric.js 3.6.6
K2's first candidate bucket, h1(K2), is empty, and K2 is not stored in h2(K2)—this is case 2(II). We will store K2 in the empty candidate bucket.

2 of 8

Created with Fabric.js 3.6.6
For the next key, K7, compute its candidate buckets

3 of 8

Created with Fabric.js 3.6.6
Both the candidate buckets, h1(K7) and h2(K7), are empty—this is case 1. We will store K7 in any of the candidate buckets. We have chosen h2(K7).

4 of 8

Created with Fabric.js 3.6.6
In the arriving requests, the next key is K3. The first step for all (PUT, GET, DELETE) requests for the write-friendly store is to compute the key's candidate buckets using the hash functions h1() and h2().

5 of 8

Created with Fabric.js 3.6.6
One candidate bucket is empty, but K3 is already stored in h1(K3)—this is case 2(I). We will update K3's offset in the log.

6 of 8

Created with Fabric.js 3.6.6
We have skipped the insertion of K6 in the slides. It is stored in its first candidate bucket, h1(K6). K2 is the next key, and we will compute its candidate buckets.

7 of 8

Created with Fabric.js 3.6.6
K2's candidate buckets, h1(K2) and h2(K2), are occupied and K2 is stored in one of them, h1(K2)—case 3(I). We will update K2's offset.

8 of 8

We will repeat the process of displacing keys for a limited number of attempts. When that number is reached, we will initialize a new write-friendly store and move all entries in the current store to the next key-value store in the pipeline. From this point onwards, the new write-friendly store will cater to arriving PUT and DELETE requests. The configured limit of the number of keys displaced is our way of determining the level of occupancy we would want for our hash table. The number of displacements required to insert a key increases as the occupancy of the hash table increases and becomes very high in the late range (90%+). Based on our hardware and other configurations, we can determine an acceptable number of attempts before we declare the current write-friendly store too occupied to allow for an acceptable cost of inserts.

Let's look at how limiting the number of displacements works. In the example below, the number of allowed displacements to insert a key is 3.

Created with Fabric.js 3.6.6
We will compute the candidate buckets for K9 using hash functions h1() and h2()

1 of 10

Created with Fabric.js 3.6.6
K9's candidate buckets, h1(K9) and h2(K9), are occupied and neither has K9 stored. We will need to displace a key in one of the candidate buckets—case 3(II). We will displace K4 in h1(K4), equal to h2(K9), to accommodate K9.

2 of 10

Created with Fabric.js 3.6.6
We’ve made one displacement. K7 occupies K4's other candidate bucket, h2(K4)—h2(K4) is the same as h2(K7). We will displace K7 and store K4.

3 of 10

Created with Fabric.js 3.6.6
We’ve made two displacements. K7's other candidate bucket, h1(K7), is empty. We will store K7 in h1(K7). Now the insertion for K9 is complete. We required two displacements to insert K9, and our limit is 3.

4 of 10

Created with Fabric.js 3.6.6
K1 is the next key in the arriving requests. We will compute its candidate buckets.

5 of 10

Created with Fabric.js 3.6.6
K1's candidate buckets are occupied, and neither has K1 stored. We will displace one of the keys in K1's candidate buckets—case 3(II), and we will displace K2 in h1(K2), also equal to h1(K1).

6 of 10

Created with Fabric.js 3.6.6
We’ve made one displacement. K3 occupies K2's other candidate bucket, h2(K2), also equal to h1(K3). We will displace K3 and store K2.

7 of 10

Created with Fabric.js 3.6.6
We’ve made two displacements. K6 resides in K3's other candidate bucket, h2(K3), equal to h1(K6). We will displace K6 and store K3.

8 of 10

Created with Fabric.js 3.6.6
We’ve made three displacements. We have reached our allowed number of displacements. The only way this insertion of K1 ends without initializing a new write-friendly store is if K6's other candidate bucket, h2(K6), is empty. Since h2(K6) is not empty—K4 resides in it—we will have to initialize a new store.

9 of 10

Created with Fabric.js 3.6.6
The first key we insert into the new store is the key for which we could not find a bucket in the allowed number of displacements. K6 will be stored in any of its candidate buckets with offset 0. In storage, we will copy the entry for K6 from the old write-friendly log to the new write-friendly log as the first entry.

10 of 10

Partial-key hashing#

We will save memory by not storing the entire key in the hash table. Instead, we will use a tag for each key in the hash table. We will mark a key's entry in the hash table with its tag by storing the tag along with its offset (position in the storage log). Every key-value entry in the write-friendly store will have the following:

  1. Complete key and value in the storage log

  2. Hash table entry containing:

    1. The key's tag

    2. The key's offset

We will compute each key's tag using a hash function and two non-overlapping key fragments from the lower-order bits of the key. Our two hash functions for cuckoo hashing essentially use only one hash function at the core but apply it to the two key fragments of a key.

To formalize, for a key K,

  • h1 applies the hash function to the first key fragment to compute the index of the first candidate bucket of that key. We will denote this index as h1(K) for the key K.

  • h2 does the same to the second key fragment to compute the index of the second candidate bucket. We will denote this index as h2(K) for the key K.

While the technique above reduces per-key memory consumption, it raises a problem. For displacing a key and storing it in its alternate bucket, we require computing the index for the alternate bucket. We cannot do this without looking up the complete key stored in storage. If we look up a key for every displacement, this can result in multiple disk seeks/reads for a single PUT or DELETE request and, consequently, a higher read amplification.

We can overcome the above problem by storing the tag for the alternate bucket instead of the current bucket in which the key is stored. For example, we insert the key K1 into its candidate bucket h1(K1), then we will mark it with h2(K1). If we need to move K1 to its alternate bucket in the future, we do not need to compute h2(K1). Now, if we need to store another key, say K2 in h1(K1) since it is the second candidate bucket for K2; more specifically, h2(K2) equals h1(K1). Since we mark entries by their alternate bucket, we do not require to look up either key to compute the alternate bucket. We can locate h2(K1) because we have marked an entry for K1 with this tag. We will store K2 in h2(K2). Now we need to mark the K1 entry with the tag h1(K1):

  • If K2 is for a new request, meaning it is not itself being displaced by another key, then we will have values for both its candidate buckets h1(K2) and h2(K2) from the initial computation of candidate buckets. In this case, we will use the value from the initial computation of candidate buckets for new requests and mark K1 with h2(K2) (which is also equal to h1(K1)) and mark K2 with h1(K2).

  • If K2 is not a new request, meaning another key is displacing it, then it is marked with h2(K2) (equal to h1(K1)), and we will use this value to mark K1. We will mark K2 with the tag on the key displacing it.

Let’s look at how this will look using our first displacement example.

Created with Fabric.js 3.6.6
This is the insertion for K9, the key for a new request. We will compute the candidate buckets for K9 using hash functions h1() and h2(). Also, note how we have stored all keys by marking them with their alternate buckets (see the first slide of the previous slide deck for comparison).

1 of 4

Created with Fabric.js 3.6.6
K9's candidate buckets, h1(K9) and h2(K9), are occupied and neither has K9 stored. We will need to displace a key in one of the candidate buckets—case 3(II). We will displace K4 in h1(K4), equal to h2(K9), to accommodate K9. When storing K9 in h2(K9), we will mark it with h1(K9). We will store the computed value of h2(K9) to mark K4 when we store it in its alternate bucket, h2(K4). When in h2(K4), K4's alternate bucket will be h2(K9) (=h1(K4)).

2 of 4

Created with Fabric.js 3.6.6
So far, we have made one displacement. K7 occupies K4's other candidate bucket, h2(K4)—h2(K4) is the same as h2(K7). We will displace K7 and store K4. We will keep the tag on K4, h2(K4), to mark K7 when storing it in its alternate bucket, h1(K7). When in h1(K7), its alternate bucket will be h2(K4) (=h2(K7)).

3 of 4

Created with Fabric.js 3.6.6
So far, we have made two displacements. K7's other candidate bucket, h1(K7), is empty. We will store K7 in h1(K7). When storing K4 in h2(K4), we have marked it with h2(K9) (=h1(K4)), and when storing K7 in h1(K7), we have marked it with h2(K4) (=h2(K7)). We had stored both values previously for this purpose. We required no storage seeks for this since we did not need any of the full keys of the displaced keys.

4 of 4

Associativity#

We can improve occupancy by increasing the associativity of the cuckoo hash table, allowing us to store more than one entry in a single bucket.

A Write-friendly Store for SILT: Part I

A Write-friendly Store for SILT: Part III